Abstract — The entropy power inequality (EPI) provides lower bounds on the differential entropy of the sum of two independent real-valued random variables in terms of the individual entropies. Versions of the EPI for discrete random variables have been obtained for special families of distributions with the differential entropy replaced by the discrete entropy, but no universal inequality is known (beyond trivial ones). More recently, the sumset theory for the entropy function provides a sharp inequality $H(X+X') - H(X) \ge \frac{1}{2} - o(1)$ when $X, X'$ are i.i.d. with high entropy. This paper provides the inequality $H(X+X') - H(X) \ge g(H(X))$, where $X, X'$ are arbitrary i.i.d. integer-valued random variables and where $g$ is a universal function on $\mathbb{R}_+$ that is strictly positive away from the origin and satisfies $g(0) = 0$. Extensions to non-identically distributed random variables and to conditional entropies are also obtained.

Index Terms — Entropy inequalities, Entropy power inequality, Mrs. Gerber's lemma, Doubling constant, Shannon sumset theory.

I. Introduction 

For a continuous random variable¹ $X$ on $\mathbb{R}^n$, let $h(X)$ be the differential entropy of $X$ and let $N(X) = 2^{2h(X)/n}$ denote the entropy power of $X$. If $X$ and $Y$ are two independent continuous random variables over $\mathbb{R}^n$, the EPI states that

$$N(X+Y) \ge N(X) + N(Y), \qquad (1)$$

with equality if and only if $X$ and $Y$ are Gaussian with proportional covariance matrices. A weaker statement of the EPI, yet of key use in applications, is the following inequality, stated here for $n = 1$:



$$h(X+X') - h(X) \ge \frac{1}{2}, \qquad (2)$$

where $X, X'$ are i.i.d., and where equality holds if and only if $X$ is Gaussian.
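As a sanity check of (2), take $X \sim \mathcal{N}(0,\sigma^2)$, whose differential entropy in bits is $\frac{1}{2}\log_2(2\pi e\sigma^2)$. Since $X+X'$ is Gaussian with variance $2\sigma^2$, the gap equals exactly $\frac{1}{2}$ for every $\sigma^2$. The short Python sketch below (function names are our own, chosen only for illustration) evaluates this closed form:

```python
import math


def gaussian_entropy_bits(var):
    """Differential entropy, in bits, of a Gaussian with variance `var`."""
    return 0.5 * math.log2(2 * math.pi * math.e * var)


def gaussian_epi_gap(var):
    """h(X + X') - h(X) for i.i.d. Gaussians; X + X' has variance 2 * var."""
    return gaussian_entropy_bits(2 * var) - gaussian_entropy_bits(var)


for var in (0.1, 1.0, 42.0):
    print(var, gaussian_epi_gap(var))  # 0.5 up to floating-point rounding
```

The variance cancels in the difference, which is why the Gaussian is exactly the equality case of (2).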

The EPI was first proposed by Shannon [1], who used a variational argument to show that Gaussian random variables $X$ and $Y$ with proportional covariance matrices and specified differential entropies constitute a stationary point for $h(X+Y)$. However, this does not exclude saddle points and local minima. The first rigorous proof of the EPI was given by Stam [2] in 1959, using de Bruijn's identity, which connects the derivative of the entropy under Gaussian perturbation to the Fisher information. This proof was further simplified by Blachman [3]. Another proof was proposed by Lieb [4], based on an extension of Young's inequality.

¹All continuous random variables are assumed to have well-defined differential entropies.



While there is a wide range of inequalities involving collections of random variables, the EPI is the only general inequality in information theory estimating the entropy of a sum of independent random variables by means of the individual entropies. It is used as a key ingredient to prove converse results in coding theorems [8]–[12].

There have been numerous extensions and simplifications of the EPI over the reals [6], [7], [13]–[21]. There have also been several attempts to obtain discrete versions of the EPI using Shannon entropy. Of course, it is not clear what is meant by a discrete version of the EPI, since (1) and (2) clearly do not hold verbatim for Shannon entropy (for instance, if $X$ is deterministic, then $H(X+X') = H(X) = 0$).

Several extensions have nevertheless been developed. First, there have been some extensions using finite field additions, for example, the so-called Mrs. Gerber's Lemma (MGL), proved in [23] by Wyner and Ziv for binary alphabets. The MGL was further extended by Witsenhausen [24] to non-binary alphabets, who also provided counter-examples for the case of general alphabets. More recently, [28] obtained EPI and MGL results for abelian groups of order $2^n$. Second, concerning discrete random variables and addition over the reals, Harremoës and Vignat [25] proved that the discrete EPI holds for binomial random variables with parameter $\frac{1}{2}$, which was later generalized by Sharma, Das and Muthukrishnan [26]. Yu and Johnson [27] obtained a version of the EPI for discrete random variables using the notion of thinning.

More recently, Tao established in [29] a sumset theory for Shannon entropy, obtaining in particular the sharp inequality

$$H(X+X') - H(X) \ge \frac{1}{2} - o(1),$$

where $o(1)$ vanishes when $H(X)$ tends to infinity. Further results were obtained for the differential entropy in [30].

In this paper, we are interested in integer-valued random variables with arithmetic over the reals. We show that there exists an increasing function $g : \mathbb{R}_+ \to \mathbb{R}_+$ such that $g(x) = 0$ if and only if $x = 0$, and

$$H(X+X') - H(X) \ge g(H(X)),$$

for any i.i.d. integer-valued random variables $X, X'$. Although we provide an explicit characterization of $g$, we found that proving the existence of such a function (even without an explicit characterization) is equally challenging. We further generalize the result to non-identically distributed random variables and to conditional entropies. We also discuss some open problems in Section IV, in particular, a closure convexity conjecture which would strengthen the conditional entropy result.

The results obtained in this paper were used in [22] to prove a polarization coding result for discrete random variables using Hadamard matrices over the reals.

Notation: The sets of integers and reals will be denoted by $\mathbb{Z}$ and $\mathbb{R}$. Similarly, $\mathbb{Z}_+$ and $\mathbb{R}_+$ will denote the sets of positive integers and non-negative reals. We will use capital letters for random variables and lowercase letters for their realizations (the random variable $X$ can have realization $x$). The natural logarithm and the logarithm in base 2 will be denoted by $\ln$ and $\log_2$, respectively, and for $x \in [0,1]$, $h_2(x) = -x\log_2(x) - (1-x)\log_2(1-x)$ will denote the binary entropy function, with the convention that $0\log_2(0) = 0$. The entropy of a discrete random variable $X$ in base 2 (bits) will be denoted by $H(X)$. We will interchangeably use $H(p)$ or $H(X)$, where $p$ is the probability distribution of $X$. The conditional entropy of a random variable $X$ given another random variable $Y$ will be denoted by $H(X|Y)$. For distributions $p, q$ over $\mathbb{Z}$, $p * q$ denotes their convolution, i.e., the distribution of $X + Y$ for independent $X \sim p$ and $Y \sim q$; $\|p\|_\infty = \max_i p(i)$, and $\|\cdot\|_1$ is the $\ell_1$-norm. For $a, b \in \mathbb{R}$, we will use $a \vee b$ and $a \wedge b$ for the maximum and minimum of $a$ and $b$. Also, $a_+ = a \vee 0$ will denote the positive part of $a$.

II. Results 

In this section, we give an overview of the results proved in the paper. The first theorem gives a lower bound on the entropy gap of the sum of two i.i.d. random variables as a function of their entropy.

Theorem 1. There is a function $g : \mathbb{R}_+ \to \mathbb{R}_+$ such that for any two i.i.d. $\mathbb{Z}$-valued random variables $X, X'$ with probability distribution $p$,

$$H(p*p) - H(p) \ge g(H(p)).$$

Moreover, $g$ is an increasing function, $\lim_{c\to\infty} g(c) = \frac{1}{8}\log_2(e)$, and $g(c) = 0$ if and only if $c = 0$.

Remark 1. The function $g$ in Theorem 1 is given by

$$g(c) = \min_{x\in[0,1]} \left\{ (cx - h_2(x)) \vee \frac{(1-x)^2\big((1-x)\vee(4x-2)_+\big)^2}{8\ln(2)} \right\}.$$
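For intuition, the formula in Remark 1 can be evaluated numerically. The following Python sketch is one possible way to do so, under our own choices: the minimum over $x \in [0,1]$ is replaced by a minimum over a uniform grid (the grid size is an arbitrary assumption), so the returned value approximates $g(c)$ from above. The helper names `h2`, `nonspiky_term` and `g` are hypothetical.

```python
import math


def h2(x):
    """Binary entropy in bits, with the convention 0 * log2(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def nonspiky_term(x):
    """(1-x)^2 ((1-x) v (4x-2)_+)^2 / (8 ln 2), the non-spiky bound."""
    m = max(1 - x, max(4 * x - 2, 0.0))
    return (1 - x) ** 2 * m ** 2 / (8 * math.log(2))


def g(c, grid=20_000):
    """Grid approximation (from above) of g(c) as defined in Remark 1."""
    xs = (k / grid for k in range(grid + 1))
    return min(max(c * x - h2(x), nonspiky_term(x)) for x in xs)


for c in (0.0, 0.5, 1.0, 2.0, 5.0, 50.0):
    print(c, g(c))
```

On a fixed grid the approximation keeps the monotonicity in $c$, and it never exceeds the value $\frac{1}{8\ln(2)}$ that the inner maximum takes at $x = 0$.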

Remark 2. As we mentioned in the introduction, a recent result by Tao [29] implies that for a $\mathbb{Z}$-valued random variable of very large entropy, $H(p*p) - H(p) \gtrsim \frac{1}{2}$. In comparison with this result, we only get an asymptotic lower bound of $\frac{1}{8}\log_2(e) \approx 0.18$. We will see later that the asymptotic lower bound $0.18$ is also valid for independent but not necessarily identically distributed random variables, provided that the entropies of both random variables approach infinity.

The next theorem extends the i.i.d. result to the general independent case.

Theorem 2. There is a function $g : \mathbb{R}_+^2 \to \mathbb{R}_+$ such that for any two independent $\mathbb{Z}$-valued random variables $X, X'$ with probability distributions $p_1, p_2$,

$$H(p_1 * p_2) - \frac{H(p_1) + H(p_2)}{2} \ge g(H(p_1), H(p_2)).$$

Moreover, $g$ is a non-negative and doubly-increasing² function of its arguments, $\lim_{(c,d)\to(\infty,\infty)} g(c,d) = \frac{1}{8}\log_2(e)$, and $g(c,d) = 0$ if and only if $c = d = 0$.

Remark 3. One might be tempted to prove the stronger bound

$$H(p_1 * p_2) - \max\{H(p_1), H(p_2)\} \ge g(H(p_1), H(p_2)), \qquad (3)$$

for some doubly-increasing function $g$ that is strictly positive away from the origin. However, this fails: for example, assume that $p_1, p_2$ are uniform distributions over $\{1, 2, \dots, M\}$ and $\{1, 2, \dots, NM\}$, for some integer $N \ge 2$. Since $p_1 * p_2$ is supported on $\{2, \dots, (N+1)M\}$, which has fewer than $(N+1)M$ points, while $H(p_2) = \log_2(NM)$, we get

$$H(p_1 * p_2) - \max\{H(p_1), H(p_2)\} \le \log_2\Big(\frac{N+1}{N}\Big),$$

which decreases to $0$ with increasing $N$. Therefore, there is no hope to get a stronger result as in (3) which holds universally for all distributions.
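The counterexample is easy to check numerically. In the sketch below (our own illustration; supports are shifted to start at $0$, which does not change any entropy), `remark3_gap(M, N)` computes $H(p_1*p_2) - \max\{H(p_1),H(p_2)\}$ for the two uniform distributions and prints it next to $\log_2\frac{N+1}{N}$:

```python
import math


def entropy(p):
    """Shannon entropy in bits of a finite probability vector."""
    return -sum(v * math.log2(v) for v in p if v > 0)


def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q supported on {0, 1, ...}."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def remark3_gap(M, N):
    """H(p1*p2) - max{H(p1), H(p2)} for p1 uniform on M points, p2 uniform on N*M points."""
    p1 = [1.0 / M] * M
    p2 = [1.0 / (N * M)] * (N * M)
    return entropy(convolve(p1, p2)) - max(entropy(p1), entropy(p2))


for N in (2, 4, 8, 16):
    print(N, remark3_gap(4, N), math.log2((N + 1) / N))
```

The gap stays strictly positive but shrinks as $N$ grows, even though $H(p_2)$ increases, which is exactly what rules out (3).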

The next theorem extends the results in Theorem 1 to the conditional case.

Theorem 3. There is a function $\tilde{g} : \mathbb{R}_+ \to \mathbb{R}_+$ such that for any two i.i.d. $\mathbb{Z}$-valued pairs of random variables $(X, Y)$ and $(X', Y')$,

$$H(X+X'|Y,Y') - H(X|Y) \ge \tilde{g}(H(X|Y)).$$

Moreover, $\tilde{g}$ is an increasing function and $\tilde{g}(c) = 0$ if and only if $c = 0$.

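Remark 4. The function in Theorem 3, which we denote by $\tilde{g}$ to distinguish it from the function $g$ of Theorem 2, is given by

$$\tilde{g}(c) = \min_{\delta\in[0,\frac{1}{2}]} \left\{ \big(g(c,c) - h_2(\delta)\big) \vee \delta^2 g(c,c) \right\}, \qquad (4)$$

where $g$ is as in Theorem 2 and $h_2(\delta)$ is the binary entropy function.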

III. Proof techniques 

In this section, we give an overview of, and some intuition for, the techniques used to prove the theorems.

A. EPI for i.i.d. random variables 

We start with the EPI for i.i.d. random variables. The main idea of the proof is to find suitable bounds for $H(p*p) - H(p)$ in two different cases: one in which $p$ is a spiky distribution, namely, there is an $i \in \mathbb{Z}$ such that $p_i$ is substantially high, and the other in which $p$ is a rather flat, non-spiky distribution; we then combine these two bounds.

Lemma 1. Assume that $p$ is a probability distribution over $\mathbb{Z}$ with $H(p) = c$ and let $x = \|p\|_\infty$. Then

$$H(p*p) - c \ge cx - h_2(x),$$

where $h_2$ is the binary entropy function.

Proof: In Appendix A. ■

²A function $g : \mathbb{R}_+^2 \to \mathbb{R}_+$ is doubly-increasing if, for any value of one of its arguments, it is an increasing function of the other argument.

Remark 5. Notice that Lemma 1 gives a very tight bound for spiky distributions, for which $\|p\|_\infty$ is very close to 1: for $H(p) = c$, we get $H(p*p) - c \approx c$, which is the best we can hope for, since $H(p*p) \le 2H(p)$.
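Lemma 1 can be spot-checked numerically. The sketch below computes the slack $[H(p*p) - c] - [cx - h_2(x)]$ for randomly generated, finitely supported distributions with a prominent atom. The generator `random_spiky` and the other helper names are our own assumptions, and a random test is of course not a proof.

```python
import math
import random


def entropy(p):
    """Shannon entropy in bits of a finite probability vector."""
    return -sum(v * math.log2(v) for v in p if v > 0)


def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q supported on {0, 1, ...}."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def h2(x):
    """Binary entropy in bits, with 0 * log2(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def lemma1_slack(p):
    """[H(p*p) - c] - [c*x - h2(x)] with c = H(p), x = max(p); Lemma 1 says this is >= 0."""
    c, x = entropy(p), max(p)
    return (entropy(convolve(p, p)) - c) - (c * x - h2(x))


def random_spiky(rng, n, spike):
    """Hypothetical generator: mass `spike` on one point, the rest spread at random."""
    w = [rng.random() + 1e-9 for _ in range(n - 1)]
    s = sum(w)
    p = [spike] + [(1 - spike) * v / s for v in w]
    rng.shuffle(p)
    return p


rng = random.Random(0)
worst = min(lemma1_slack(random_spiky(rng, rng.randint(2, 12), rng.uniform(0.3, 0.99)))
            for _ in range(500))
print("smallest slack:", worst)
```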

The next step is to give a bound for non-spiky distributions. The main idea is that, in this case, it is possible to decompose the probability distribution $p$ into two parts $p_1, p_2$ with disjoint, non-interlacing supports such that $p*p_1$ and $p*p_2$ are sufficiently far apart in $\ell_1$-distance. We formalize this through the following lemmas.

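Lemma 2. Let $c > 0$, $0 < \alpha \le \frac{1}{2}$ and $n \in \mathbb{Z}$. Assume that $p$ is a probability measure over $\mathbb{Z}$ such that $\alpha \le p((-\infty, n]) \le 1 - \alpha$ and $H(p) = c$. Then

$$\|p*p_1 - p*p_2\|_1 \ge 2\alpha,$$

where

$$p_1 = \frac{p|_{(-\infty,n]}}{p((-\infty,n])}, \qquad p_2 = \frac{p|_{[n+1,\infty)}}{p([n+1,\infty))}$$

are the scaled restrictions of $p$ to $(-\infty, n]$ and $[n+1, \infty)$.

Proof: In Appendix A. ■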

Lemma 3. Assume that $p_1$, $p_2$ and $p$ are arbitrary probability distributions over $\mathbb{Z}$ such that $p_1$ and $p_2$ have non-overlapping supports and $\|p\|_\infty = x$. Then

$$\|p*p_1 - p*p_2\|_1 \ge 2(2x-1)_+.$$

Proof: In Appendix A. ■
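Lemmas 2 and 3 lend themselves to the same kind of spot check. In the following sketch, a distribution on $\{0,\dots,L-1\}$ is split at a point $n$ into the scaled restrictions $p_1, p_2$ of Lemma 2 (kept on the same index grid so that the convolutions line up), and we take $\alpha = \min\{\alpha_1, 1-\alpha_1\}$ with $\alpha_1 = p((-\infty,n])$. All names and the random generator are illustrative assumptions.

```python
import random


def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q supported on {0, 1, ...}."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def l1(u, v):
    """l1 distance between two vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(u, v))


def split(p, n):
    """Scaled restrictions p1, p2 of p to {0..n} and {n+1..}, kept on p's index grid."""
    a1, a2 = sum(p[: n + 1]), sum(p[n + 1:])
    p1 = [v / a1 for v in p[: n + 1]] + [0.0] * (len(p) - n - 1)
    p2 = [0.0] * (n + 1) + [v / a2 for v in p[n + 1:]]
    return a1, p1, p2


def lemma23_slacks(p, n):
    """(d - 2*alpha, d - 2*(2x-1)_+) with d = ||p*p1 - p*p2||_1; both should be >= 0."""
    a1, p1, p2 = split(p, n)
    d = l1(convolve(p, p1), convolve(p, p2))
    x = max(p)
    return d - 2 * min(a1, 1 - a1), d - 2 * max(2 * x - 1, 0.0)


rng = random.Random(1)
worst2 = worst3 = float("inf")
for _ in range(300):
    w = [rng.random() + 1e-3 for _ in range(rng.randint(2, 10))]
    w[rng.randrange(len(w))] *= rng.uniform(1, 50)  # sometimes create a dominant atom
    s = sum(w)
    p = [v / s for v in w]
    s2, s3 = lemma23_slacks(p, rng.randrange(len(p) - 1))
    worst2, worst3 = min(worst2, s2), min(worst3, s3)
print("smallest slacks:", worst2, worst3)
```

Boosting one entry matters for Lemma 3: its bound is only non-trivial when $\|p\|_\infty > \frac{1}{2}$.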

Lemma 4. Assuming the hypotheses of Lemma 2,

$$H(p*p) - c \ge \frac{\alpha^2}{2\ln(2)}\,\|p*p_1 - p*p_2\|_1^2.$$

Proof: In Appendix A. ■



Lemma 5. Assume that $p$ is a probability distribution over $\mathbb{Z}$ with $H(p) = c$ and $\|p\|_\infty = x$. Then

$$H(p*p) - c \ge \frac{(1-x)^2\big((1-x)\vee(4x-2)_+\big)^2}{8\ln(2)}.$$

Proof: In Appendix A. ■
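A quick numerical check of Lemma 5, in the same hedged spirit (random finitely supported distributions, helper names of our own choosing):

```python
import math
import random


def entropy(p):
    """Shannon entropy in bits of a finite probability vector."""
    return -sum(v * math.log2(v) for v in p if v > 0)


def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q supported on {0, 1, ...}."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def lemma5_bound(x):
    """Right-hand side of Lemma 5 as a function of x = max_i p_i."""
    m = max(1 - x, max(4 * x - 2, 0.0))
    return (1 - x) ** 2 * m ** 2 / (8 * math.log(2))


def lemma5_slack(p):
    """[H(p*p) - H(p)] minus the bound of Lemma 5; should be non-negative."""
    return entropy(convolve(p, p)) - entropy(p) - lemma5_bound(max(p))


rng = random.Random(3)
worst = float("inf")
for _ in range(400):
    w = [rng.random() + 1e-3 for _ in range(rng.randint(2, 12))]
    w[rng.randrange(len(w))] *= rng.uniform(1, 100)  # occasionally a dominant atom
    s = sum(w)
    worst = min(worst, lemma5_slack([v / s for v in w]))
print("smallest slack:", worst)
```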

Now that we have the required bounds in the spiky and non-spiky cases, we can combine them to prove Theorem 1.

Proof of Theorem 1: Assume that $p$ is a probability distribution over $\mathbb{Z}$ with $H(p) = c$ and $\|p\|_\infty = x$. It is easy to see that $x \ge 2^{-c}$, since $c = \sum_i p_i\log_2(1/p_i) \ge \log_2(1/x)$. Also, setting $\alpha = \frac{1-x}{2}$, there is an integer $n$ such that $\alpha \le p((-\infty, n]) \le 1 - \alpha$. Using Lemma 1 and Lemma 5, it results that $H(p*p) - c \ge l(c)$, where

$$l(c) = \min_{x\in[2^{-c},1]} \left\{ (cx - h_2(x)) \vee \frac{(1-x)^2\big((1-x)\vee(4x-2)_+\big)^2}{8\ln(2)} \right\}.$$

We will use a simpler lower bound given by

$$g(c) = \min_{x\in[0,1]} \left\{ (cx - h_2(x)) \vee \frac{(1-x)^2\big((1-x)\vee(4x-2)_+\big)^2}{8\ln(2)} \right\},$$
where obviously $l(c) \ge g(c)$. It is easy to check that $g(c)$ is a continuous function of $c$. The monotonicity of $g$ follows from the monotonicity of $cx - h_2(x)$ with respect to $c$, for every $x \in [0,1]$. For strict positivity, note that $(1-x)^2\big((1-x)\vee(4x-2)_+\big)^2$ is strictly positive for $x \in [0,1)$ and it is $0$ when $x = 1$, but $\lim_{x\to 1} cx - h_2(x) = c$. Hence, for $c > 0$, $g(c) > 0$. If $c = 0$, then

$$g(0) = \min_{x\in[0,1]} \left\{ (-h_2(x)) \vee \frac{(1-x)^2\big((1-x)\vee(4x-2)_+\big)^2}{8\ln(2)} \right\},$$

and since $-h_2(x) \le 0$ while the second term is non-negative and vanishes at $x = 1$, its minimum over $[0,1]$ is $0$.

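For the asymptotic behavior, notice that at $x = 0$, $cx - h_2(x) = 0$ and $\frac{(1-x)^2((1-x)\vee(4x-2)_+)^2}{8\ln(2)} = \frac{1}{8\ln(2)}$. Hence, $g(c) \le \frac{1}{8\ln(2)}$ for any $c \ge 0$. Moreover, for any $0 < \epsilon < \frac{1}{2}$ there exists a $c_0$ such that for every $c \ge c_0$ and every $x$ with $\epsilon \le x \le 1$, $cx - h_2(x) \ge \frac{1}{8\ln(2)}$. Thus, for any $\epsilon > 0$ there is a $c_0$ such that for $c \ge c_0$, the outer minimum over $x$ in the definition of $g(c)$ is achieved on $[0, \epsilon]$, where the second term equals $\frac{(1-x)^4}{8\ln(2)}$ and is therefore at least $\frac{(1-\epsilon)^4}{8\ln(2)}$. This implies that for every $\epsilon > 0$,

$$\frac{1}{8\ln(2)} \ge \limsup_{c\to\infty} g(c) \ge \liminf_{c\to\infty} g(c) \ge \frac{(1-\epsilon)^4}{8\ln(2)},$$

and hence $\lim_{c\to\infty} g(c) = \frac{1}{8\ln(2)} = \frac{1}{8}\log_2(e)$. ■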

Figure 1 shows the EPI gap. As expected, the asymptotic gap is $\frac{1}{8}\log_2(e) \approx 0.18$.

Fig. 1: The EPI gap for discrete random variables over $\mathbb{Z}$.

B. EPI for non-i.i.d. random variables 

Theorem 2 is an extension of Theorem 1 to independent but non-identically distributed random variables. Similar to the i.i.d. case, the idea is to distinguish between spiky and non-spiky distributions.



Lemma 6. Assume that $p$ and $q$ are two probability distributions over $\mathbb{Z}$ with $H(p) = c$ and $H(q) = d$. Suppose that $x = \|p\|_\infty$ and $y = \|q\|_\infty$. Then

$$2H(p*q) - c - d \ge dx - h_2(x) + cy - h_2(y), \qquad (5)$$

where $h_2$ is the binary entropy function.

Proof: In Appendix B. ■

When at least one of the distributions is spiky, Lemma 6 gives a relatively tight bound. Hence, we should try to find a good bound for the non-spiky case.
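Inequality (5) can be spot-checked numerically as well. The sketch below evaluates both sides of (5) for random pairs $(p,q)$ with finite supports; the generator `random_dist`, which boosts one entry to create spiky cases, is our own assumption.

```python
import math
import random


def entropy(p):
    """Shannon entropy in bits of a finite probability vector."""
    return -sum(v * math.log2(v) for v in p if v > 0)


def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q supported on {0, 1, ...}."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def h2(x):
    """Binary entropy in bits, with 0 * log2(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def lemma6_slack(p, q):
    """Left-hand side minus right-hand side of (5); Lemma 6 says this is >= 0."""
    c, d = entropy(p), entropy(q)
    x, y = max(p), max(q)
    lhs = 2 * entropy(convolve(p, q)) - c - d
    rhs = d * x - h2(x) + c * y - h2(y)
    return lhs - rhs


def random_dist(rng, n):
    """Hypothetical test generator: random weights with one randomly boosted entry."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    w[rng.randrange(n)] *= rng.uniform(1, 100)
    s = sum(w)
    return [v / s for v in w]


rng = random.Random(2)
worst = min(lemma6_slack(random_dist(rng, rng.randint(1, 10)), random_dist(rng, rng.randint(1, 10)))
            for _ in range(400))
print("smallest slack:", worst)
```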

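Lemma 7. Let $p, q$ be two probability distributions over $\mathbb{Z}$. Assume that there are $0 < \alpha, \beta \le \frac{1}{2}$ and $m, n \in \mathbb{Z}$ such that $\alpha \le p((-\infty, m]) \le 1 - \alpha$ and $\beta \le q((-\infty, n]) \le 1 - \beta$. Then

$$\|q*p_1 - q*p_2\|_1 + \|p*q_1 - p*q_2\|_1 \ge 2(\alpha + \beta),$$

where

$$p_1 = \frac{p|_{(-\infty,m]}}{p((-\infty,m])}, \quad p_2 = \frac{p|_{[m+1,\infty)}}{p([m+1,\infty))}, \quad q_1 = \frac{q|_{(-\infty,n]}}{q((-\infty,n])}, \quad q_2 = \frac{q|_{[n+1,\infty)}}{q([n+1,\infty))}.$$

Proof: In Appendix B. ■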
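A numerical spot check of Lemma 7, assuming finitely supported $p, q$ on $\{0,1,\dots\}$ and taking $\alpha = \min\{\alpha_1,1-\alpha_1\}$, $\beta = \min\{\beta_1,1-\beta_1\}$ for randomly chosen split points $m, n$ (all helper names are illustrative):

```python
import random


def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q supported on {0, 1, ...}."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def l1(u, v):
    """l1 distance between two vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(u, v))


def split(p, n):
    """Scaled restrictions of p to {0..n} and {n+1..}, kept on p's index grid."""
    a1, a2 = sum(p[: n + 1]), sum(p[n + 1:])
    p1 = [v / a1 for v in p[: n + 1]] + [0.0] * (len(p) - n - 1)
    p2 = [0.0] * (n + 1) + [v / a2 for v in p[n + 1:]]
    return a1, p1, p2


def lemma7_slack(p, m, q, n):
    """||q*p1 - q*p2||_1 + ||p*q1 - p*q2||_1 - 2(alpha + beta); Lemma 7 says this is >= 0."""
    a1, p1, p2 = split(p, m)
    b1, q1, q2 = split(q, n)
    alpha, beta = min(a1, 1 - a1), min(b1, 1 - b1)
    a = l1(convolve(q, p1), convolve(q, p2))
    b = l1(convolve(p, q1), convolve(p, q2))
    return a + b - 2 * (alpha + beta)


def random_dist(rng, n):
    """Hypothetical test generator with strictly positive entries."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]


rng = random.Random(4)
worst = float("inf")
for _ in range(300):
    p = random_dist(rng, rng.randint(2, 9))
    q = random_dist(rng, rng.randint(2, 9))
    worst = min(worst, lemma7_slack(p, rng.randrange(len(p) - 1), q, rng.randrange(len(q) - 1)))
print("smallest slack:", worst)
```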

Lemma 8. Assume that the hypotheses of Lemma 7 hold and let $H(p) = c$ and $H(q) = d$. Then

$$H(p*q) - d \ge \frac{\alpha^2}{2\ln(2)}\,\|q*p_1 - q*p_2\|_1^2,$$

$$H(p*q) - c \ge \frac{\beta^2}{2\ln(2)}\,\|p*q_1 - p*q_2\|_1^2.$$

Proof: In Appendix B. ■



Lemma 9. Let $p$ and $q$ be probability distributions over $\mathbb{Z}$ with $H(p) = c$, $H(q) = d$, $\|p\|_\infty = x$ and $\|q\|_\infty = y$. Then

$$2H(p*q) - c - d \ge l(x, y),$$

where

$$l(x,y) = \min_{(a,b)\in T(x,y)} \frac{(1-x)^2a^2 + (1-y)^2b^2}{8\ln(2)},$$

and $T(x,y)$ is a subset of $\mathbb{R}_+^2$, parameterized by $(x,y) \in [0,1]\times[0,1]$ and given by the following inequalities:

$$a \ge (4y-2)_+, \qquad b \ge (4x-2)_+, \qquad a + b \ge 2 - x - y.$$

Moreover, $l(x,y)$ is a continuous function of $(x,y)$, $l(x,y) \ge 0$, and $l(x,y) = 0$ if and only if $(x,y) = (1,1)$.

Proof: In Appendix B. ■
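The minimization defining $l(x,y)$ is two-dimensional and easy to approximate. The following sketch rests on a simple observation of ours: for fixed $a$, the objective is increasing in $b \ge 0$, so the best $b$ is $\max\{(4x-2)_+, 2-x-y-a\}$, and an optimal $a$ never exceeds $(4y-2)_+ + 2$. It then scans $a$ over a grid (the grid size is an arbitrary choice), so the result approximates $l(x,y)$ from above.

```python
import math


def l_xy(x, y, grid=4000):
    """Approximate l(x, y) = min over T(x, y) of ((1-x)^2 a^2 + (1-y)^2 b^2) / (8 ln 2).

    T(x, y): a >= (4y-2)_+, b >= (4x-2)_+, a + b >= 2 - x - y, with a, b >= 0.
    """
    a_min = max(4 * y - 2, 0.0)
    b_min = max(4 * x - 2, 0.0)
    s = 2 - x - y
    best = math.inf
    for k in range(grid + 1):
        a = a_min + 2.0 * k / grid          # scan a over [a_min, a_min + 2]
        b = max(b_min, s - a)               # smallest feasible b for this a
        best = min(best, (1 - x) ** 2 * a ** 2 + (1 - y) ** 2 * b ** 2)
    return best / (8 * math.log(2))


for xy in ((0.0, 0.0), (0.5, 0.5), (1.0, 0.5), (1.0, 1.0)):
    print(xy, l_xy(*xy))
```

At $(x,y) = (0,0)$ the value is $\frac{1}{4\ln(2)}$, the constant that appears in the asymptotic part of the proof of Theorem 2, and at $(1,1)$ it vanishes, as stated in Lemma 9.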

Proof of Theorem 2: Let $p$ and $q$ denote the distributions of $X$ and $X'$, with $H(p) = c$ and $H(q) = d$, and let $x = \|p\|_\infty$ and $y = \|q\|_\infty$. It is easy to check that $x \ge 2^{-c}$ and $y \ge 2^{-d}$. Using Lemma 6 and Lemma 9, we obtain

$$H(p*q) - \frac{c+d}{2} \ge s(c,d),$$

where $s(c,d)$ is given by

$$s(c,d) = \frac{1}{2}\min_{(x,y)\in R(c,d)} \left\{ \big(dx - h_2(x) + cy - h_2(y)\big) \vee l(x,y) \right\},$$

for $R(c,d) = [2^{-c}, 1]\times[2^{-d}, 1]$. We will use a simpler lower bound given by

$$g(c,d) = \frac{1}{2}\min_{(x,y)\in R} \left\{ \big(dx - h_2(x) + cy - h_2(y)\big) \vee l(x,y) \right\},$$

where $R = [0,1]\times[0,1]$. It is easy to see that $g(c,d)$ is a continuous function. It is also a doubly-increasing function of its arguments. To prove the last part, notice that $l(x,y)$ in the definition of $g$ is strictly positive except at $(x,y) = (1,1)$. But $\lim_{(x,y)\to(1,1)} dx - h_2(x) + cy - h_2(y) = c + d$, which is strictly positive unless $c = d = 0$. Therefore, for $(c,d) \ne (0,0)$, $g(c,d) > 0$.

The function $dx - h_2(x) + cy - h_2(y)$ is an increasing function of $(c,d)$ over $R$, which implies that $g(c,d)$ must be an increasing function of $(c,d)$. Also, using an argument similar to the one in the proof of Theorem 1, it is possible to show that for high values of $c$ and $d$, the outer minimum in the definition of $g$ is achieved in a small enough neighborhood of $(0,0)$, namely, $[0,\epsilon]\times[0,\epsilon]$ for some small enough $\epsilon > 0$. From the continuity of $l(x,y)$, it can be shown that in this range the value of $l(x,y)$ is very close to

$$\min_{(a,b):\, a,b\ge 0,\ a+b\ge 2} \frac{a^2 + b^2}{8\ln(2)} = \frac{1}{4\ln(2)}.$$

This implies that

$$\lim_{(c,d)\to(\infty,\infty)} g(c,d) = \frac{1}{8\ln(2)}.$$

This completes the proof of the EPI result for the general independent case. ■

C. Conditional EPI 

In this part, we will prove the EPI result for the conditional case, where we try to find a lower bound for the conditional entropy gap $H(X+X'|Y,Y') - H(X|Y)$ for i.i.d. $\mathbb{Z}$-valued pairs $(X,Y)$ and $(X',Y')$, assuming that $H(X|Y) = c$ for some positive number $c$. Notice that as $Y$ and $Y'$ only appear in the conditioning, we do not lose generality by assuming them to be $\mathbb{Z}$-valued. Let us denote the probability distribution of $Y$ by $q$; then the conditional entropy gap can be written as

$$\sum_{i,j\in\mathbb{Z}} q_i q_j H(p_i * p_j) - \sum_{i\in\mathbb{Z}} q_i H(p_i),$$

where $p_i$ is the conditional distribution of $X$ given $Y = i$.

Notice that we are interested in the infimum of this gap over all possible $q, p_i$ satisfying $\sum_{i\in\mathbb{Z}} q_i H(p_i) = c$. Even if the minimizing $q$ exists, it may not be finitely supported and, in general, finding the corresponding gap requires an infinite-dimensional constrained optimization.

To cope with this problem, we will show that it is possible to restrict the support size of $q$ to 2, provided that instead of the i.i.d. case we consider the general independent and non-identically distributed one. Of course, in the end we get a looser bound as the price of simplifying the problem.

To be more specific, let $(X,Y)$ and $(X',Y')$ be independent $\mathbb{Z}$-valued pairs with $H(X|Y) = H(X'|Y') = c$, and let $t_n(c)$ be the infimum of $H(X+X'|Y,Y') - c$ over all such $(X,Y), (X',Y')$ with $Y$ and $Y'$ having a support size of at most $n$. Also, let $t_\infty(c)$ be the corresponding infimum when there is no constraint on the support size. We first prove the following lemma.

Lemma 10. For every $n \ge 2$, $t_\infty(c) = t_n(c)$.

Proof: Obviously, $t_n(c) \ge t_\infty(c)$. Moreover, given any $\epsilon > 0$, there is an $\epsilon$-optimal independent pair $(X,Y)$ and $(X',Y')$ such that

$$H(X+X'|Y,Y') - c \le t_\infty(c) + \epsilon.$$

Let $q, q'$ denote the distributions of $Y, Y'$ and let $p_i, p'_j$ be the conditional distributions of $X, X'$ given $Y = i$, $Y' = j$. Let

$$V = \{v_{ij} \in \mathbb{R}^3 : v_{ij} = (H(p_i * p'_j), H(p_i), H(p'_j)),\ i,j \in \mathbb{Z}\}.$$

It is easy to see that

$$\sum_{i,j\in\mathbb{Z}} q_i q'_j\, v_{ij} = (H(X+X'|Y,Y'), c, c),$$

which implies that the three-dimensional vector $h := (H(X+X'|Y,Y'), c, c)$ can be written as a convex combination of the vectors $v_{ij} \in V$ with weights $q_i q'_j$. Let $v_i = \sum_j q'_j v_{ij}$. Then we have $\sum_i q_i v_i = h$. Notice that the second component of $v_i$ is equal to $H(p_i)$. Also, the third component is equal to $c$ independently of $i$, which implies that only two components of $v_i$ depend on $i$. Therefore, by Carathéodory's theorem, it is possible to write $h$ as a convex combination of at most three vectors $v_i$, $i \in \mathbb{Z}$, which, without loss of generality, we can assume to be $\{v_0, v_1, v_2\}$. In other words, there are $\gamma_i \ge 0$, $i = 0,1,2$, with $\sum_{i=0}^2 \gamma_i = 1$ and $h = \sum_{i=0}^2 \gamma_i v_i$. Also, note that if we change the distribution of $Y$ from $q$ to $\gamma$, the resulting $(X,Y), (X',Y')$ is again an $\epsilon$-optimal solution. Now, we claim that we can simplify the problem further and find a probability triple $\psi = (\psi_0, \psi_1, \psi_2)$ with at most two non-zero elements such that $\sum_{i=0}^2 \psi_i H(p_i) = c$ and at the same time

$$\sum_{i=0}^2 \psi_i v_i^{(1)} \le \sum_{i=0}^2 \gamma_i v_i^{(1)} = H(X+X'|Y,Y'),$$

where $v_i^{(1)}$ denotes the first coordinate of the vector $v_i$. This implies that if we replace the distribution $\gamma$ of $Y$ by $\psi$, which has a support of size 2, we get a lower $H(X+X'|Y,Y')$.

To prove the claim, let us consider the following optimization problem:

$$\begin{aligned} \text{minimize} \quad & \sum_{i=0}^{2} \psi_i v_i^{(1)} \\ \text{s.t.} \quad & \sum_{i=0}^{2} \psi_i H(p_i) = c, \quad \sum_{i=0}^{2} \psi_i = 1, \quad \psi_i \ge 0. \end{aligned}$$

First of all, notice that since $\sum_{i=0}^2 \gamma_i H(p_i) = c$, $\gamma$ is in the feasible set. Therefore, the feasible set is a non-empty subset of the probability simplex in $\mathbb{R}^3$. Also, as the objective function is linear in $\psi$, the minimum is attained at a vertex (basic feasible solution) of the feasible set; since there are only two equality constraints, such a vertex has at most two non-zero components, and this proves the claim.

By symmetry, we can apply the same argument to the probability distribution $q'$ of $Y'$ to get an $\epsilon$-optimal solution in which the supports of both $q$ and $q'$ have size at most 2. Hence, for any $\epsilon > 0$ and any $n \ge 2$, $t_n(c) \le t_2(c) \le t_\infty(c) + \epsilon$. In other words, $t_n(c) = t_\infty(c)$. This completes the proof. ■
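The key step of the proof, replacing $\gamma$ by a weight vector $\psi$ with at most two non-zero entries, amounts to a tiny linear program. The sketch below solves it by enumerating the vertices of the feasible set, which is valid because a linear objective attains its minimum at a vertex. The numbers used for $H(p_i)$ and $v_i^{(1)}$ in the usage example are arbitrary placeholders, not entropies of actual distributions, and the function name is hypothetical.

```python
def two_point_weights(H, F, c, tol=1e-12):
    """Minimize sum(psi[i] * F[i]) s.t. sum(psi[i] * H[i]) = c, sum(psi) = 1, psi >= 0.

    H[i] plays the role of H(p_i) and F[i] that of v_i^(1). Every vertex of the feasible
    set is either a single index with H[i] = c or a point on an edge {i, j} of the
    simplex, so it has at most two non-zero entries; we enumerate these candidates.
    """
    candidates = []
    for i in range(3):
        if abs(H[i] - c) <= tol:
            psi = [0.0, 0.0, 0.0]
            psi[i] = 1.0
            candidates.append(psi)
        for j in range(i + 1, 3):
            if H[i] != H[j] and min(H[i], H[j]) <= c <= max(H[i], H[j]):
                t = (c - H[j]) / (H[i] - H[j])  # weight on index i
                psi = [0.0, 0.0, 0.0]
                psi[i], psi[j] = t, 1.0 - t
                candidates.append(psi)
    if not candidates:
        raise ValueError("c is not a convex combination of the entries of H")
    return min(candidates, key=lambda psi: sum(w * f for w, f in zip(psi, F)))


H = [1.0, 2.0, 4.0]      # placeholder values standing in for H(p_0), H(p_1), H(p_2)
F = [3.0, 2.5, 5.0]      # placeholder values standing in for v_0^(1), v_1^(1), v_2^(1)
gamma = [0.2, 0.5, 0.3]  # a feasible starting weight vector
c = sum(w * h for w, h in zip(gamma, H))
psi = two_point_weights(H, F, c)
print(psi, sum(w * f for w, f in zip(psi, F)), sum(w * f for w, f in zip(gamma, F)))
```

In this example the returned $\psi$ puts its weight on indices 1 and 2 and achieves a smaller objective than $\gamma$ while keeping $\sum_i \psi_i H(p_i) = c$.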



Lemma 10 allows us to simplify finding the lower bound. However, we might get a looser bound because we relaxed the condition that $(X,Y)$ and $(X',Y')$ be identically distributed. From now on, we will assume that $Y$ and $Y'$ are binary-valued random variables. We will use the following two lemmas to get a lower bound for the conditional entropy gap.

Lemma 11. Let $(X,Y)$, $(X',Y')$ be an independent pair of random variables, where $Y$ and $Y'$ are binary-valued with $\mathbb{P}(Y = 0) = \alpha$, $\mathbb{P}(Y' = 0) = \beta$ and $H(X|Y) = H(X'|Y') = c$. Then

$$H(X+X'|Y,Y') - c \ge g(c,c) - \min\{h_2(\alpha), h_2(\beta)\},$$

where $g$ is the same function as in Theorem 2.

Proof: In Appendix C. ■

Lemma 12. Assume that all of the conditions of Lemma 11 hold. Suppose there is $0 < \delta \le \frac{1}{2}$ such that $\delta \le \alpha, \beta \le 1 - \delta$. Then

$$H(X+X'|Y,Y') - c \ge \delta^2 g(c,c).$$

Proof: In Appendix C. ■

Proof of Theorem 3: The proof follows by combining the results obtained in Lemma 11 and Lemma 12. Let $\delta = \min\{\alpha, 1-\alpha, \beta, 1-\beta\}$. Then $0 \le \delta \le \frac{1}{2}$ and, using Lemma 12, we get the lower bound $\delta^2 g(c,c)$. Similarly, from Lemma 11 and using the fact that $\min\{h_2(\alpha), h_2(\beta)\} = h_2(\delta)$, we get the lower bound $g(c,c) - h_2(\delta)$. Combining the two, we obtain the desired lower bound, which we denote by $\tilde{g}$ to distinguish it from the function $g$ of Theorem 2:

$$\tilde{g}(c) = \min_{\delta\in[0,\frac{1}{2}]} \left\{ \big(g(c,c) - h_2(\delta)\big) \vee \delta^2 g(c,c) \right\}.$$

The monotonicity of $\tilde{g}$ follows from the monotonicity of $g(c,c)$. Also, notice that $\delta^2 g(c,c)$ is strictly positive unless $\delta = 0$, but $\lim_{\delta\to 0} g(c,c) - h_2(\delta) = g(c,c)$, which is strictly positive if $c > 0$. Therefore, for $c > 0$ we have $\tilde{g}(c) > 0$. This completes the proof. ■
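Given a value of $g(c,c)$, the one-dimensional minimization defining $\tilde{g}(c)$ is straightforward to evaluate. In the sketch below the caller supplies `G`, standing for $g(c,c)$ (for instance, an approximation obtained numerically from the definition in the proof of Theorem 2); the grid over $\delta$ is our own discretization and the function name is hypothetical.

```python
import math


def h2(x):
    """Binary entropy in bits, with 0 * log2(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def g_tilde(G, grid=100_000):
    """Grid approximation of min over delta in [0, 1/2] of max(G - h2(delta), delta^2 * G).

    G stands for the value g(c, c) of Theorem 2 and must be supplied by the caller.
    """
    best = math.inf
    for k in range(grid + 1):
        d = 0.5 * k / grid
        best = min(best, max(G - h2(d), d * d * G))
    return best


for G in (0.05, 0.1, 0.18):
    print(G, g_tilde(G))
```

Since $\delta = 0$ is always admissible, the result never exceeds $G$; in practice it comes out much smaller, which illustrates the price paid for relaxing the i.i.d. assumption.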

IV. Open problems 

A. Closure convexity of the entropy set $\mathcal{H}$

As we saw in the proof of Theorem 3, the conditional EPI does not directly follow from the unconditional one. In particular, we had to relax the i.i.d. condition in order to get a relatively weak lower bound. In this part, we propose another approach to the problem, which uses the closure convexity of the entropy set, defined as follows.



Definition 1. The entropy set $\mathcal{H}$ is defined as follows:

$$\mathcal{H} := \{(H(p*q), H(p), H(q)) \in \mathbb{R}_+^3 : p, q \text{ are probability distributions over } \mathbb{Z}\}.$$

Remark 6. Notice that multiple pairs $(p,q)$ may be mapped to the same point in $\mathcal{H}$. For example, if $(p,q)$ is mapped to a point $v \in \mathcal{H}$, then any pair $(\tilde{p}, \tilde{q})$ in which $\tilde{p}$ and $\tilde{q}$ are shifted versions of $p$ and $q$ is also mapped to $v$.

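Remark 7. Some of the boundaries of the set $\mathcal{H}$ trivially follow from the properties of the entropy, i.e., for any $v \in \mathcal{H}$,

$$v^{(2)} \vee v^{(3)} \le v^{(1)} \le v^{(2)} + v^{(3)},$$

where $v^{(i)}$ denotes the $i$-th coordinate of the vector $v$. Also, the boundary $v^{(1)} = v^{(2)} + v^{(3)}$ is achievable. To show this, let $v^{(2)}, v^{(3)} \in \mathbb{R}_+$ and consider two finite-support distributions $p$ and $q$ with supports $\{0, 1, \dots, M-1\}$ and $\{0, 1, \dots, N-1\}$, for appropriate $M$ and $N$, such that $H(p) = v^{(2)}$ and $H(q) = v^{(3)}$. Now, fix $p$ and define a new distribution $\tilde{q}$ as follows:

$$\tilde{q}(Mk) = q(k), \quad k \in \{0, 1, \dots, N-1\}, \qquad \tilde{q}(j) = 0 \text{ otherwise}.$$

It is not difficult to show that $H(\tilde{q}) = H(q) = v^{(3)}$ and $H(p*\tilde{q}) = H(p) + H(\tilde{q}) = v^{(2)} + v^{(3)}$, since every integer of the form $i + Mk$ with $0 \le i \le M-1$ determines $(i,k)$ uniquely.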
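The construction of $\tilde{q}$ can be verified directly. In this sketch (an illustration with arbitrarily chosen $p$ and $q$; the function name `stretch` is ours), `stretch(q, M)` places $q(k)$ at position $Mk$, so that the sums $i + Mk$ never collide:

```python
import math


def entropy(p):
    """Shannon entropy in bits of a finite probability vector."""
    return -sum(v * math.log2(v) for v in p if v > 0)


def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q supported on {0, 1, ...}."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def stretch(q, M):
    """q~(M*k) = q(k) and q~ = 0 elsewhere, as in the construction above."""
    r = [0.0] * (M * (len(q) - 1) + 1)
    for k, v in enumerate(q):
        r[M * k] = v
    return r


p = [0.5, 0.25, 0.25]   # support {0, 1, 2}, so M = 3
q = [0.1, 0.6, 0.3]     # support {0, 1, 2}
q_tilde = stretch(q, len(p))
print(entropy(convolve(p, q_tilde)), entropy(p) + entropy(q))  # equal
print(entropy(convolve(p, q)))  # strictly smaller: the plain sums collide
```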

We propose the following conjecture about the set $\mathcal{H}$.

Conjecture 1. The closure of the set $\mathcal{H}$ is convex.

Using this conjecture, we can prove the following theorem, which is a stronger version of the conditional EPI.

Theorem 4. Assume that Conjecture 1 holds. Let $(X,Y)$ and $(X',Y')$ be independent pairs of $\mathbb{Z}$-valued random variables with $H(X|Y) = c$ and $H(X'|Y') = d$. Then

$$H(X+X'|Y,Y') - \frac{c+d}{2} \ge g(c,d),$$

where $g$ is the same function as in Theorem 2.

Proof: Let us assume that the distributions of $Y, Y'$ are $q, q'$, respectively. Also assume that $p_i, p'_j$ are the distributions of $X, X'$ when $Y = i$, $Y' = j$. Let

$$v_{ij} = (H(p_i * p'_j), H(p_i), H(p'_j)), \qquad i, j \in \mathbb{Z}.$$

Notice that $v_{ij} \in \mathcal{H}$. We also have

$$(H(X+X'|Y,Y'), c, d) = \sum_{i,j\in\mathbb{Z}} q_i q'_j\, v_{ij},$$

which is a convex combination of the vectors $v_{ij}$. By the closure convexity of $\mathcal{H}$, for any $\epsilon > 0$ it is possible to find an $h \in \mathcal{H}$ in the $\epsilon$-neighborhood of $(H(X+X'|Y,Y'), c, d)$. In other words, for the given $\epsilon > 0$, there are two distributions $\mu_1, \mu_2$ over $\mathbb{Z}$ such that

$$\begin{aligned} H(\mu_1 * \mu_2) - \epsilon &\le H(X+X'|Y,Y') \le H(\mu_1 * \mu_2) + \epsilon, \\ H(\mu_1) - \epsilon &\le c \le H(\mu_1) + \epsilon, \\ H(\mu_2) - \epsilon &\le d \le H(\mu_2) + \epsilon. \end{aligned}$$

In particular, this implies that

$$\begin{aligned} H(X+X'|Y,Y') - \frac{c+d}{2} &\ge H(\mu_1 * \mu_2) - \frac{H(\mu_1) + H(\mu_2)}{2} - 2\epsilon \\ &\ge g(H(\mu_1), H(\mu_2)) - 2\epsilon \\ &\ge g(c-\epsilon, d-\epsilon) - 2\epsilon, \end{aligned}$$

where we used Theorem 2 and the monotonicity of $g$ with respect to both arguments. As $\epsilon > 0$ is arbitrary and $g$ is a continuous function, it results that $H(X+X'|Y,Y') - \frac{c+d}{2} \ge g(c,d)$. ■

Remark 8. In the case that $(X,Y)$ and $(X',Y')$ are i.i.d. pairs with $H(X|Y) = H(X'|Y') = c$, this result reduces to

$$H(X+X'|Y,Y') - c \ge g(c,c),$$

which is tighter than the bound (4) obtained in Theorem 3, since taking $\delta = 0$ in (4) shows that the bound there never exceeds $g(c,c)$.
Appendix A

EPI FOR I.I.D. RANDOM VARIABLES 

Proof of Lemma 1: Assume that $X$ is a $\mathbb{Z}$-valued random variable with probability distribution $p$. Let $i \in \mathbb{Z}$ be such that $p(i) = \|p\|_\infty = x$. Let $p_i$ be the probability distribution $p$ shifted by $i$, i.e., $p_i(k) = p(k+i)$ for every $k \in \mathbb{Z}$, so that $p_i(0) = x$. Assume that $P := p_i$. Note that $H(p*p) = H(P*P)$ and $H(P) = H(p) = c$; from now on, let $X$ have distribution $P$. Let $B$ be a binary random variable with $\mathbb{P}\{B = 0\} = x = 1 - \mathbb{P}\{B = 1\}$, and let $R$ be a random variable defined by $\mathbb{P}\{R = k\} = P(k)/(1-x)$ for every $k \in \mathbb{Z}\setminus\{0\}$ and $\mathbb{P}\{R = 0\} = 0$. Note that $X$ has the same distribution as $BR$ for independent $B$ and $R$. We also have $H(X) = h_2(x) + (1-x)H(R)$. Let $X'$ be an independent copy of $X$, also independent of $(B,R)$. Then, we have

$$\begin{aligned} H(P*P) &= H(BR + X') \\ &\ge H(BR + X' | B) \\ &= xc + (1-x)H(X' + R) \\ &\ge xc + (1-x)H(R) \\ &= xc + c - h_2(x). \end{aligned}$$

This yields $H(p*p) - c \ge xc - h_2(x)$. ■

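Proof of Lemma 2: Let $\alpha_1 = p((-\infty, n])$ and $\alpha_2 = p([n+1,\infty)) = 1 - \alpha_1$. Note that $p = \alpha_1 p_1 + \alpha_2 p_2$. We distinguish two cases, $\alpha_1 \le \frac{1}{2}$ and $\alpha_1 > \frac{1}{2}$. If $\alpha_1 \le \frac{1}{2}$, then we have

$$\begin{aligned} \|p*p_1 - p*p_2\|_1 &= \|\alpha_1 p_1*p_1 - (1-\alpha_1)p_2*p_2 + (1-2\alpha_1)p_1*p_2\|_1 \\ &\ge \|\alpha_1 p_1*p_1 - (1-\alpha_1)p_2*p_2\|_1 - (1-2\alpha_1)\|p_1*p_2\|_1 \\ &= \alpha_1 + (1-\alpha_1) - (1-2\alpha_1) = 2\alpha_1 \ge 2\alpha, \end{aligned}$$

whereas if $\alpha_1 > \frac{1}{2}$ we have

$$\begin{aligned} \|p*p_1 - p*p_2\|_1 &= \|\alpha_1 p_1*p_1 - (1-\alpha_1)p_2*p_2 + (1-2\alpha_1)p_1*p_2\|_1 \\ &\ge \|\alpha_1 p_1*p_1 - (1-\alpha_1)p_2*p_2\|_1 - (2\alpha_1-1)\|p_1*p_2\|_1 \\ &= \alpha_1 + (1-\alpha_1) - (2\alpha_1 - 1) = 2(1-\alpha_1) \ge 2\alpha, \end{aligned}$$

where we used the triangle inequality, $1 - \alpha_1 \ge \alpha$, and the fact that $p_1*p_1$ and $p_2*p_2$ have non-overlapping supports (contained in $(-\infty, 2n]$ and $[2n+2, \infty)$, respectively), so the $\ell_1$-norm of their combination is equal to the sum of the corresponding $\ell_1$-norms. ■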

Proof of Lemma 3: Let $n_0 \in \mathbb{Z}$ be such that $p(n_0) = \|p\|_\infty = x$. We have

$$\begin{aligned} \|p*p_1 - p*p_2\|_1 &= \sum_i |p*p_1(i) - p*p_2(i)| \\ &= \sum_i \Big|\sum_j p(j)\big(p_1(i-j) - p_2(i-j)\big)\Big| \\ &\ge \sum_i p(n_0)|p_1(i-n_0) - p_2(i-n_0)| - \sum_i\sum_{j\ne n_0} p(j)|p_1(i-j) - p_2(i-j)| \\ &= x\|p_1 - p_2\|_1 - (1-x)\|p_1 - p_2\|_1 \\ &= 2(2x-1), \end{aligned}$$

where we used the fact that $p_1$ and $p_2$ have non-overlapping supports, thus $\|p_1 - p_2\|_1 = \|p_1\|_1 + \|p_2\|_1 = 2$. As $\|p*p_1 - p*p_2\|_1 \ge 0$, we have $\|p*p_1 - p*p_2\|_1 \ge 2(2x-1)_+$. ■

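Proof of Lemma 4: Let $\alpha_1$ and $\alpha_2$ be the same as in the proof of Lemma 2. Let $\nu_1 = p_1 * p$, $\nu_2 = p_2 * p$, and for $x \in [0,1]$, define $\mu_x = x\nu_1 + (1-x)\nu_2$ and $f(x) = H(\mu_x)$. We have

$$f'(x) = -\sum_i (\nu_{1i} - \nu_{2i})\log_2(\mu_{xi}), \qquad f''(x) = -\frac{1}{\ln(2)}\sum_i \frac{(\nu_{1i} - \nu_{2i})^2}{\mu_{xi}} \le 0.$$

Therefore, $f(x)$ is a concave function of $x$. Moreover,

$$f'(0) = D(\nu_1\|\nu_2) + H(\nu_1) - H(\nu_2), \qquad f'(1) = -D(\nu_2\|\nu_1) + H(\nu_1) - H(\nu_2).$$

Since $p_1$ and $p_2$ have different supports, when $p$ is finitely supported there are $i, j$ (e.g., the smallest and the largest points of the support of $p*p$) such that $\nu_{1i} = 0$, $\nu_{2i} > 0$ and $\nu_{1j} > 0$, $\nu_{2j} = 0$. Hence $D(\nu_1\|\nu_2)$ and $D(\nu_2\|\nu_1)$ are both equal to infinity; in other words, $f'(0) = +\infty$ and $f'(1) = -\infty$, so the maximum of $f$ lies strictly between $0$ and $1$. In general, since $f$ is continuous and concave on $[0,1]$, it attains its maximum at some $x^* \in [0,1]$. Assume that, for fixed $\nu_1$ and $\nu_2$, $x^*$ is such a maximizer. If $0 < \alpha_1 \le x^*$, then, by concavity,

$$\alpha_1 f'(\alpha_1) = \sum_i \alpha_1(\nu_{2i} - \nu_{1i})\log_2(\mu_{\alpha_1 i}) \ge 0,$$

which implies that

$$\begin{aligned} f(\alpha_1) &= -\sum_i \mu_{\alpha_1 i}\log_2(\mu_{\alpha_1 i}) \\ &= -\sum_i \big(\nu_{2i} + \alpha_1(\nu_{1i} - \nu_{2i})\big)\log_2(\mu_{\alpha_1 i}) \\ &\ge -\sum_i \nu_{2i}\log_2(\mu_{\alpha_1 i}) = H(\nu_2) + D(\nu_2\|\mu_{\alpha_1}) \\ &\ge H(p) + \frac{1}{2\ln(2)}\|\nu_2 - \mu_{\alpha_1}\|_1^2 = H(p) + \frac{\alpha_1^2}{2\ln(2)}\|\nu_1 - \nu_2\|_1^2, \end{aligned}$$

where we used $H(\nu_2) = H(p_2*p) \ge H(p)$ and Pinsker's inequality for distributions $r$ and $s$,

$$D(r\|s) \ge \frac{1}{2\ln(2)}\|r - s\|_1^2.$$

Similarly, we can show that if $x^* \le \alpha_1 < 1$, then

$$f(\alpha_1) \ge H(p) + \frac{(1-\alpha_1)^2}{2\ln(2)}\|\nu_1 - \nu_2\|_1^2.$$

As $\alpha \le \alpha_1 \le 1-\alpha$ and $\alpha \le \frac{1}{2}$, it results that

$$\begin{aligned} H(p*p) &= H(\alpha_1 p*p_1 + (1-\alpha_1)p*p_2) = f(\alpha_1) \\ &\ge H(p) + \frac{\alpha^2}{2\ln(2)}\|\nu_1 - \nu_2\|_1^2 = c + \frac{\alpha^2}{2\ln(2)}\|p*p_1 - p*p_2\|_1^2. \end{aligned}$$

■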

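Proof of Lemma 5: Let $x = \|p\|_\infty$ and $\alpha = \frac{1-x}{2}$. It is easy to show that there is an $n \in \mathbb{Z}$ such that $\alpha \le p((-\infty, n]) \le 1 - \alpha$: take the smallest $n$ with $p((-\infty,n]) \ge \alpha$; then $p((-\infty,n]) < \alpha + p(n) \le \alpha + x = 1 - \alpha$. Also, let $p_1$ and $p_2$, as in Lemma 2, be the scaled restrictions of $p$ to $(-\infty, n]$ and $[n+1, \infty)$. As $p_1$ and $p_2$ have disjoint supports, using Lemma 2 and Lemma 3 it results that

$$\|p*p_1 - p*p_2\|_1 \ge (1-x) \vee (4x-2)_+.$$

Therefore, using Lemma 4, we get

$$H(p*p) - c \ge \frac{\alpha^2}{2\ln(2)}\big((1-x)\vee(4x-2)_+\big)^2 = \frac{(1-x)^2\big((1-x)\vee(4x-2)_+\big)^2}{8\ln(2)}.$$

■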
Appendix B 

EPI FOR NON-I.I.D. RANDOM VARIABLES 

Proof of Lemma 6: Let $X$ and $Y$ be two independent random variables with probability distributions $p$ and $q$. Similar to the proof of Lemma 1, there is a binary random variable $B$ with $\mathbb{P}(B = 0) = x$ and a random variable $R$ independent of $B$ such that $\tilde{X} = BR$, where $\tilde{X}$ is a suitably shifted version of $X$ such that $\mathbb{P}(\tilde{X} = 0) = x$. Also, $H(\tilde{X}) = h_2(x) + (1-x)H(R)$. Then, taking $(B,R)$ independent of $Y$, we get

$$\begin{aligned} H(p*q) &= H(X+Y) = H(\tilde{X}+Y) = H(BR+Y) \\ &\ge H(BR + Y|B) \\ &= \mathbb{P}(B = 0)H(Y) + \mathbb{P}(B = 1)H(R+Y) \\ &\ge xd + (1-x)H(R) \\ &= xd + c - h_2(x), \end{aligned}$$

which implies that $H(p*q) - c \ge xd - h_2(x)$. By symmetry, we also obtain that $H(p*q) - d \ge yc - h_2(y)$. Combining these two results, we get

$$2H(p*q) - c - d \ge dx - h_2(x) + cy - h_2(y).$$

■

Proof of Lemma 7: Let $\alpha_1 = p((-\infty, m])$, $\alpha_2 = 1 - \alpha_1$, $\beta_1 = q((-\infty, n])$ and $\beta_2 = 1 - \beta_1$. Note that $p = \alpha_1 p_1 + \alpha_2 p_2$ and $q = \beta_1 q_1 + \beta_2 q_2$. Thus we obtain

$$\begin{aligned} &\|q*p_1 - q*p_2\|_1 + \|p*q_1 - p*q_2\|_1 \\ &\quad\ge \|q*p_1 - q*p_2 + p*q_1 - p*q_2\|_1 \\ &\quad= \|(\alpha_1+\beta_1)p_1*q_1 + (\beta_2 - \alpha_1)p_1*q_2 + (\alpha_2 - \beta_1)p_2*q_1 - (\alpha_2+\beta_2)p_2*q_2\|_1 \\ &\quad\ge \|(\alpha_1+\beta_1)p_1*q_1 - (\alpha_2+\beta_2)p_2*q_2\|_1 - \|(\beta_2-\alpha_1)p_1*q_2 + (\alpha_2-\beta_1)p_2*q_1\|_1 \\ &\quad\ge \alpha_1 + \beta_1 + \alpha_2 + \beta_2 - |\beta_2 - \alpha_1| - |\alpha_2 - \beta_1| \\ &\quad= 2\big(1 - |1 - (\alpha_1+\beta_1)|\big), \end{aligned}$$

where we used the triangle inequality and the fact that $p_1*q_1$ and $p_2*q_2$ have non-overlapping supports. Now, two cases can happen: if $\alpha_1 + \beta_1 \le 1$, then $1 - |1 - (\alpha_1+\beta_1)| = \alpha_1 + \beta_1 \ge \alpha + \beta$. Otherwise, $\alpha_1 + \beta_1 > 1$ and we obtain

$$1 - |1 - (\alpha_1+\beta_1)| = 2 - (\alpha_1+\beta_1) = \alpha_2 + \beta_2 \ge \alpha + \beta.$$

Therefore, in both cases we get

$$\|q*p_1 - q*p_2\|_1 + \|p*q_1 - p*q_2\|_1 \ge 2(\alpha+\beta),$$

which is the desired result. ■

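Proof of Lemma 8: Let $\alpha_1 := p((-\infty, m])$, $\alpha_2 := 1 - \alpha_1$, $\nu_1 := p_1 * q$, $\nu_2 := p_2 * q$, and for $x \in [0,1]$, let $\mu_x := x\nu_1 + (1-x)\nu_2$ and $f(x) := H(\mu_x)$. By an argument similar to the one in the proof of Lemma 4, now using $H(\nu_1), H(\nu_2) \ge H(q) = d$, we can show that

$$H(p*q) = f(\alpha_1) \ge d + \frac{\alpha^2}{2\ln(2)}\|\nu_1 - \nu_2\|_1^2,$$

which implies that

$$H(p*q) - d \ge \frac{\alpha^2}{2\ln(2)}\|q*p_1 - q*p_2\|_1^2.$$

The other inequality in the lemma follows by symmetry. ■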



Proof of Lemma 9: As $\|p\|_\infty = x$ and $\|q\|_\infty = y$, setting $\alpha = \frac{1-x}{2}$ and $\beta = \frac{1-y}{2}$ (for which suitable $m, n$ exist, as in the proof of Lemma 5) and using Lemma 8, we obtain

$$2H(p*q) - c - d \ge \frac{\alpha^2a^2 + \beta^2b^2}{2\ln(2)} = \frac{(1-x)^2a^2 + (1-y)^2b^2}{8\ln(2)},$$

where $a = \|q*p_1 - q*p_2\|_1$ and $b = \|p*q_1 - p*q_2\|_1$. Also, from Lemma 7 we have

$$a + b \ge 2(\alpha + \beta) = 2 - x - y. \qquad (6)$$

Furthermore, applying Lemma 3 to the distribution $p$ with $\|p\|_\infty = x$ and $q_1, q_2$ with disjoint supports, and similarly to $q$ with $\|q\|_\infty = y$ and $p_1, p_2$ with disjoint supports, we get

$$b \ge (4x-2)_+, \qquad a \ge (4y-2)_+. \qquad (7)$$

Therefore,

$$2H(p*q) - c - d \ge l(x,y),$$

where

$$l(x,y) = \min_{(a,b)\in T(x,y)} \frac{(1-x)^2a^2 + (1-y)^2b^2}{8\ln(2)},$$

and $T(x,y)$ is defined by the three inequalities derived in (6) and (7) and $a, b \ge 0$.

The continuity of $l(x,y)$ can be easily checked. For the last part of the lemma, notice that if $M := x \vee y < 1$, then it is not difficult to show that

$$l(x,y) \ge \min_{a+b\ge 2-2M} \frac{(1-M)^2(a^2+b^2)}{8\ln(2)},$$

which is strictly positive. Moreover, if $x \vee y = 1$ but $(x,y) \ne (1,1)$, then, for example, $y \in [0,1)$ and $x = 1$, which implies that $b \ge 2$. Therefore, we get $l(x,y) \ge \frac{(1-y)^2}{2\ln(2)}$, which is strictly positive unless $y = 1$. A similar argument applies to $x \in [0,1)$, $y = 1$. Therefore, over $(x,y) \in [0,1]\times[0,1]$, $l(x,y) \ge 0$ and $l(x,y) = 0$ if and only if $(x,y) = (1,1)$. ■

Appendix C

Conditional EPI

Proof of Lemma 11: To prove the lemma, notice that we have the constraint $H(X|Y) = H(X'|Y') = c$ and the probability distributions of $Y, Y'$ have a support of size 2. We first prove that it is possible to modify the conditional distributions of the random variables $X$ and $X'$ given $Y$ and $Y'$ in a way that none of the constraints are violated, $H(X+X'|Y,Y')$ remains fixed and, simultaneously, $H(Y|X)$ and $H(Y'|X')$ become as small as we want. To show this, let $p_i, p'_j$, $i,j \in \{0,1\}$, be the distributions of $X, X'$ conditioned on $Y = i$, $Y' = j$. Notice that if we shift any $p_i, p'_j$ to the right or to the left by as many steps as we want, the conditional entropies remain unchanged, and so does $H(X+X'|Y,Y')$. We claim that by a suitable shift of the distributions, it is possible to make $H(Y|X)$ as small as we want. The same is true for $H(Y'|X')$.

To prove the claim, let $\epsilon > 0$ and assume that $A_\epsilon$ and $B_\epsilon$ are subsets of $\mathbb{Z}$ of minimal size such that $p_0(A_\epsilon) \ge 1 - \epsilon/2$ and $p_1(B_\epsilon) \ge 1 - \epsilon/2$. In particular, for any $i \in A_\epsilon$, $j \in B_\epsilon$, $p_0(i) > 0$ and $p_1(j) > 0$. For $n \in \mathbb{Z}_+$, let us define $B_\epsilon^{(n)} = \{i + n : i \in B_\epsilon\}$ to be the right shift of $B_\epsilon$ by $n$. Also, assume that $p_1^{(n)}$ is the probability distribution $p_1$ shifted to the right by $n$, namely, for $k \in \mathbb{Z}$, $p_1^{(n)}(k) = p_1(k-n)$. In particular, this implies that

$$p_1^{(n)}(B_\epsilon^{(n)}) = p_1(B_\epsilon).$$

Now let us replace $p_1$ by $p_1^{(n)}$, and let us denote the resulting random variable by $X$. This replacement does not change $H(X|Y)$ and $H(X+X'|Y,Y')$. Moreover,

$$\mathbb{P}(X \in A_\epsilon \cup B_\epsilon^{(n)}) \ge \alpha p_0(A_\epsilon) + (1-\alpha)p_1^{(n)}(B_\epsilon^{(n)}) \ge 1 - \epsilon/2.$$

As $A_\epsilon$ and $B_\epsilon$ are finite sets, there is $N_1$ such that for all $n \ge N_1$, the two sets $A_\epsilon$ and $B_\epsilon^{(n)}$ are disjoint. For $a \in A_\epsilon$ and $b \in B_\epsilon$, let us compute the conditional distribution of $Y$ given $X = a$ and $X = b + n$. We have

$$\mathbb{P}(Y = 0|X = a) = \frac{\alpha p_0(a)}{\alpha p_0(a) + (1-\alpha)p_1(a-n)}, \qquad \mathbb{P}(Y = 1|X = b+n) = \frac{(1-\alpha)p_1(b)}{(1-\alpha)p_1(b) + \alpha p_0(b+n)}.$$

It is not difficult to see that for all $a \in A_\epsilon$ and all $b \in B_\epsilon$, both of these numbers converge to 1 as $n$ goes to infinity, which implies that both $H(Y|X = a)$ and $H(Y|X = b+n)$ converge to $0$. In particular, there is an $N_2$ such that for $n \ge N_2$ these conditional entropies are less than $\epsilon/2$. Therefore, since $H(Y|X = k) \le 1$ for every $k$, for $n \ge \max\{N_1, N_2\}$ we have

$$H_n(Y|X) = \sum_{k} \mathbb{P}_X(k)H(Y|X=k) = \sum_{k\in A_\epsilon\cup B_\epsilon^{(n)}} \mathbb{P}_X(k)H(Y|X=k) + \sum_{k\notin A_\epsilon\cup B_\epsilon^{(n)}} \mathbb{P}_X(k)H(Y|X=k) \le \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,$$
which proves the claim. Now assume that we have selected $(X,Y), (X',Y')$ such that $H(Y|X), H(Y'|X') \le \epsilon$ for some small positive number $\epsilon$. Then we have

$$\begin{aligned} H(X+X'|Y,Y') - c &= H(X+X') - H(X) - I(X+X';Y,Y') + I(X;Y) \\ &\ge H(X+X') - H(X) - H(Y,Y') + H(Y) - H(Y|X) \\ &\ge H(X+X') - H(X) - H(Y,Y') + H(Y) - \epsilon \\ &= H(X+X') - H(X) - H(Y') - \epsilon \\ &\ge g(H(X), H(X')) - h_2(\beta) - \epsilon \\ &\ge g(c,c) - h_2(\beta) - \epsilon, \end{aligned}$$

where we used the independence of $Y, Y'$, the increasing property of $g$, and the fact that $H(X) \ge H(X|Y) = c$ and, similarly, $H(X') \ge c$. As this is true for any $\epsilon > 0$, we obtain

$$H(X+X'|Y,Y') - c \ge g(c,c) - h_2(\beta).$$

By symmetry, we also have

$$H(X+X'|Y,Y') - c \ge g(c,c) - h_2(\alpha).$$

Therefore, we get the desired result

$$H(X+X'|Y,Y') - c \ge g(c,c) - \min\{h_2(\alpha), h_2(\beta)\}.$$

■
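The shifting argument used in the proof of Lemma 11 can be illustrated with finitely supported conditional distributions, where it is even exact: once the shift $n$ is at least the length of the support of $p_0$, the value of $X$ determines $Y$ and $H(Y|X) = 0$, while $H(X|Y) = \alpha H(p_0) + (1-\alpha)H(p_1)$ does not depend on $n$ at all. The function below is our own illustration and assumes $p_0, p_1$ are given as probability vectors on $\{0,1,\dots\}$.

```python
import math


def cond_entropy_y_given_x(alpha, p0, p1, n):
    """H(Y|X) in bits when P(Y=0) = alpha, X | Y=0 ~ p0, and X | Y=1 ~ p1 shifted right by n."""
    joint = {}
    for k, v in enumerate(p0):
        joint[(k, 0)] = alpha * v
    for k, v in enumerate(p1):
        joint[(k + n, 1)] = (1 - alpha) * v
    px = {}  # marginal distribution of X
    for (x, _), v in joint.items():
        px[x] = px.get(x, 0.0) + v
    return -sum(v * math.log2(v / px[x]) for (x, _), v in joint.items() if v > 0)


p0, p1 = [0.5, 0.3, 0.2], [0.1, 0.6, 0.3]
vals = [cond_entropy_y_given_x(0.4, p0, p1, n) for n in range(5)]
print(vals)  # decreases as the overlap of the two supports shrinks, then stays at 0
```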
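Proof of Lemma 12: Under the hypotheses of Lemma 12 (which include those of Lemma 11), since $c = \sum_{k} q_k H(p_k) = \sum_{l} q'_l H(p'_l)$, there must be $i, j \in \{0,1\}$ such that $H(p_i), H(p'_j) \ge c$. Therefore, we have

$$\begin{aligned} H(X+X'|Y,Y') - c &= \sum_{k,l=0}^{1} q_k q'_l \Big(H(p_k * p'_l) - \frac{H(p_k) + H(p'_l)}{2}\Big) \\ &\ge q_i q'_j \Big(H(p_i * p'_j) - \frac{H(p_i) + H(p'_j)}{2}\Big) \\ &\ge \delta^2 g(c,c), \end{aligned}$$

where we used that every term of the sum is non-negative (since $H(p_k * p'_l) \ge H(p_k) \vee H(p'_l)$), $q_i q'_j \ge \delta^2$, Theorem 2, and the monotonicity of $g$. ■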



